On-line Policy Improvement using Monte-Carlo Search

Authors

Gerald Tesauro

Gregory R. Galperin (MIT AI Lab, 545 Technology Square, Cambridge, MA 02139)

Abstract

We present a Monte-Carlo simulation algorithm for real-time policy improvement of an adaptive controller. In the Monte-Carlo simulation, the long-term expected reward of each possible action is statistically measured, using the initial policy to make decisions in each step of the simulation. The action maximizing the measured expected reward is then taken, resulting in an improved policy. Our algorithm is easily parallelizable and has been implemented on the IBM SP1 and SP2 parallel-RISC supercomputers. We have obtained promising initial results in applying this algorithm to the domain of backgammon. Results are reported for a wide variety of initial policies, ranging from a random policy to TD-Gammon, an extremely strong multi-layer neural network. In each case, the Monte-Carlo algorithm gives a substantial reduction, by as much as a factor of 5 or more, in the error rate of the base players. The algorithm is also potentially useful in many other adaptive control applications in which it is possible to simulate the environment.
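The procedure described in the abstract (estimate each candidate action's long-term reward by simulating with the initial policy, then take the action with the highest estimate) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `step(state, action, rng)` simulator interface, and the five-cell toy chain standing in for backgammon are all assumptions made for illustration. The rollouts are mutually independent, which is what makes the approach easy to parallelize, although this sketch runs them sequentially.

```python
import random
from typing import Callable, Sequence, Tuple

# Hypothetical interfaces, assumed for illustration (not from the paper):
#   step(state, action, rng) -> (next_state, reward, done)
#   base_policy(state, rng)  -> action
StepFn = Callable[[int, str, random.Random], Tuple[int, float, bool]]
PolicyFn = Callable[[int, random.Random], str]


def rollout_return(state: int, action: str, step: StepFn,
                   base_policy: PolicyFn, rng: random.Random,
                   max_steps: int = 1000) -> float:
    """Total reward of one simulated episode: take `action` first,
    then let the base policy choose every later move."""
    state, reward, done = step(state, action, rng)
    total = reward
    for _ in range(max_steps):  # cap the episode length as a safeguard
        if done:
            break
        state, reward, done = step(state, base_policy(state, rng), rng)
        total += reward
    return total


def monte_carlo_improved_action(state: int, actions: Sequence[str],
                                step: StepFn, base_policy: PolicyFn,
                                n_rollouts: int = 500,
                                seed: int = 0) -> Tuple[str, float]:
    """Estimate each action's expected return by averaging rollouts,
    then return the action with the highest estimate and that estimate."""
    rng = random.Random(seed)
    best_action, best_value = None, float("-inf")
    for action in actions:
        mean = sum(rollout_return(state, action, step, base_policy, rng)
                   for _ in range(n_rollouts)) / n_rollouts
        if mean > best_value:
            best_action, best_value = action, mean
    return best_action, best_value


def chain_step(state: int, action: str, rng: random.Random):
    """Toy environment: cells 0..4, cells 0 and 4 are terminal, and
    reaching cell 4 pays reward 1. A move goes the intended way with
    probability 0.8 and the opposite way otherwise."""
    intended = 1 if action == "right" else -1
    move = intended if rng.random() < 0.8 else -intended
    nxt = state + move
    return nxt, (1.0 if nxt == 4 else 0.0), nxt in (0, 4)


def random_policy(state: int, rng: random.Random) -> str:
    """Weak base policy: choose a direction uniformly at random."""
    return rng.choice(["left", "right"])
```

Calling `monte_carlo_improved_action(2, ["left", "right"], chain_step, random_policy)` from the middle cell should favor `"right"`, since under the random base policy that first move has the higher expected return (0.65 versus 0.35 in this toy chain). Any stronger base policy could be passed in place of `random_policy`, as long as it follows the same assumed interface.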

Similar resources

Loss of Load Expectation Assessment in Deregulated Power Systems Using Monte Carlo Simulation and Intelligent Systems

Deregulation policy has caused some changes in the concepts of power systems reliability assessment and enhancement. In this paper, generation reliability is considered, and a method for its assessment using intelligent systems is proposed. Also, because of power market and generators’ forced outages stochastic behavior, Monte Carlo Simulation is used for reliability evaluation. Generation r...

Parallel Non-Stationary Direct Policy Search for Risk Averse Stochastic Optimization

This paper presents an algorithmic strategy to non-stationary policy search for finite-horizon, discrete-time Markovian decision problems with large state spaces, constrained action sets, and a risk-sensitive optimality criterion. The methodology relies on modeling time-variant policy parameters by a non-parametric response surface model for an indirect parametrized policy motivated by Bellman’...

Combine Monte Carlo with Exhaustive Search: Effective Variational Inference and Policy Gradient Reinforcement Learning

In this paper we discuss very preliminary work on how we can reduce the variance in black box variational inference based on a framework that combines Monte Carlo with exhaustive search. We also discuss how Monte Carlo and exhaustive search can be combined to deal with infinite dimensional discrete spaces. Our method builds upon and extends a recently proposed algorithm that constructs stochast...

Using Monte Carlo Search with Data Aggregation to Improve Robot Soccer Policies

RoboCup soccer competitions are considered among the most challenging multi-robot adversarial environments, due to their high dynamism and the partial observability of the environment. In this paper we introduce a method based on a combination of Monte Carlo search and data aggregation (MCSDA) to adapt discrete-action soccer policies for a defender robot to the strategy of the opponent team. By...

ReinforceWalk: Learning to Walk in Graph with Monte Carlo Tree Search

Learning to walk over a graph towards a target node for a given input query and a source node is an important problem in applications such as knowledge graph reasoning. It can be formulated as a reinforcement learning (RL) problem that has a known state transition model, but with partial observability and sparse reward. To overcome these challenges, we develop a graph walking agent called Reinf...

Publication date: 1996
